{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    ".. _nb_subset_selection:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Subset Selection Problem\n",
    "\n",
    "A genetic algorithm can be used to approach subset selection problems by defining custom operators. In general a metaheuristic algorithm might not be the ultimate goal to implement in a real-world scenario, however, it might be useful to investigate patterns or characteristics of possible good subsets. \n",
    "Let us consider a simple toy problem where we have to select numbers from a list. For every solution exactly 10 numbers have to be selected that their sum is minimized.\n",
    "For subset selection problem a binary encoding can be used where 1 indicates a number is picked. In our problem formulation the list of numbers is represented by $L$ and the binary encoded variable by $x$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\\begin{align}\n",
    "\\begin{split}\n",
    "\\min f(x) & = & \\sum_{k=1}^{n} L_k \\cdot x_k\\\\[2mm]\n",
    "\\text{s.t.} \\quad g(x)  & = & (\\sum_{k=1}^{n} x_k - 10)^2\\\\[2mm]\n",
    "\\end{split}\n",
    "\\end{align}\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As shown above, the equality constraint is handled by making sure $g(x)$ can only be zero if exactly 10 numbers are chosen.\n",
    "The problem can be implemented as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from pymoo.model.problem import Problem\n",
    "\n",
    "class SubsetProblem(Problem):\n",
    "    def __init__(self,\n",
    "                 L,\n",
    "                 n_max\n",
    "                 ):\n",
    "        super().__init__(n_var=len(L), n_obj=1, n_constr=1, elementwise_evaluation=True)\n",
    "        self.L = L\n",
    "        self.n_max = n_max\n",
    "\n",
    "    def _evaluate(self, x, out, *args, **kwargs):\n",
    "        out[\"F\"] = np.sum(self.L[x])\n",
    "        out[\"G\"] = (self.n_max - np.sum(x)) ** 2\n",
    "    \n",
    "    \n",
    "# create the actual problem to be solved\n",
    "np.random.seed(1)\n",
    "L = np.array([np.random.randint(100) for _ in range(100)])\n",
    "n_max = 10\n",
    "problem = SubsetProblem(L, n_max)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The customization requires to write custom operators in order to solve this problem efficiently. We recommend to consider the feasibility directly in the evolutionary operators, because otherwise most of the time infeasible solutions will be processed.\n",
    "The sampling creates randomly solution where the subset constraint will always be satisfied. \n",
    "The mutation randomly removes a number and chooses another one. The crossover, first takes the common numbers of both parents and then randomly picks either from the first or from the second parent until enough numbers are picked."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "from pymoo.model.crossover import Crossover\n",
    "from pymoo.model.mutation import Mutation\n",
    "from pymoo.model.sampling import Sampling\n",
    "\n",
    "\n",
    "class MySampling(Sampling):\n",
    "\n",
    "    def _do(self, problem, n_samples, **kwargs):\n",
    "        X = np.full((n_samples, problem.n_var), False, dtype=np.bool)\n",
    "\n",
    "        for k in range(n_samples):\n",
    "            I = np.random.permutation(problem.n_var)[:problem.n_max]\n",
    "            X[k, I] = True\n",
    "\n",
    "        return X\n",
    "\n",
    "\n",
    "class BinaryCrossover(Crossover):\n",
    "    def __init__(self):\n",
    "        super().__init__(2, 1)\n",
    "\n",
    "    def _do(self, problem, X, **kwargs):\n",
    "        n_parents, n_matings, n_var = X.shape\n",
    "\n",
    "        _X = np.full((self.n_offsprings, n_matings, problem.n_var), False)\n",
    "\n",
    "        for k in range(n_matings):\n",
    "            p1, p2 = X[0, k], X[1, k]\n",
    "\n",
    "            both_are_true = np.logical_and(p1, p2)\n",
    "            _X[0, k, both_are_true] = True\n",
    "\n",
    "            n_remaining = problem.n_max - np.sum(both_are_true)\n",
    "\n",
    "            I = np.where(np.logical_xor(p1, p2))[0]\n",
    "\n",
    "            S = I[np.random.permutation(len(I))][:n_remaining]\n",
    "            _X[0, k, S] = True\n",
    "\n",
    "        return _X\n",
    "\n",
    "\n",
    "class MyMutation(Mutation):\n",
    "    def _do(self, problem, X, **kwargs):\n",
    "        for i in range(X.shape[0]):\n",
    "            X[i, :] = X[i, :]\n",
    "            is_false = np.where(np.logical_not(X[i, :]))[0]\n",
    "            is_true = np.where(X[i, :])[0]\n",
    "            X[i, np.random.choice(is_false)] = True\n",
    "            X[i, np.random.choice(is_true)] = False\n",
    "\n",
    "        return X"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After having defined the operators a genetic algorithm can be initialized."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "===========================================================================\n",
      "n_gen |  n_eval |   cv (min)   |   cv (avg)   |     fopt     |     favg    \n",
      "===========================================================================\n",
      "    1 |     100 |  0.00000E+00 |  0.00000E+00 |          258 |  4.43940E+02\n",
      "    2 |     200 |  0.00000E+00 |  0.00000E+00 |          205 |  3.54340E+02\n",
      "    3 |     300 |  0.00000E+00 |  0.00000E+00 |          175 |  3.00390E+02\n",
      "    4 |     400 |  0.00000E+00 |  0.00000E+00 |          161 |  2.52240E+02\n",
      "    5 |     500 |  0.00000E+00 |  0.00000E+00 |          124 |  2.18240E+02\n",
      "    6 |     600 |  0.00000E+00 |  0.00000E+00 |          118 |  1.91860E+02\n",
      "    7 |     700 |  0.00000E+00 |  0.00000E+00 |          118 |  1.70050E+02\n",
      "    8 |     800 |  0.00000E+00 |  0.00000E+00 |           85 |  1.49630E+02\n",
      "    9 |     900 |  0.00000E+00 |  0.00000E+00 |           85 |  1.39550E+02\n",
      "   10 |    1000 |  0.00000E+00 |  0.00000E+00 |           85 |  1.30580E+02\n",
      "   11 |    1100 |  0.00000E+00 |  0.00000E+00 |           74 |  1.19930E+02\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "   12 |    1200 |  0.00000E+00 |  0.00000E+00 |           74 |  1.11800E+02\n",
      "   13 |    1300 |  0.00000E+00 |  0.00000E+00 |           74 |  1.06050E+02\n",
      "   14 |    1400 |  0.00000E+00 |  0.00000E+00 |           68 |  1.00100E+02\n",
      "   15 |    1500 |  0.00000E+00 |  0.00000E+00 |           68 |  9.46300E+01\n",
      "   16 |    1600 |  0.00000E+00 |  0.00000E+00 |           52 |  8.97600E+01\n",
      "   17 |    1700 |  0.00000E+00 |  0.00000E+00 |           50 |  8.45100E+01\n",
      "   18 |    1800 |  0.00000E+00 |  0.00000E+00 |           50 |  8.09000E+01\n",
      "   19 |    1900 |  0.00000E+00 |  0.00000E+00 |           50 |  7.66900E+01\n",
      "   20 |    2000 |  0.00000E+00 |  0.00000E+00 |           50 |  7.33700E+01\n",
      "   21 |    2100 |  0.00000E+00 |  0.00000E+00 |           45 |  7.00100E+01\n",
      "   22 |    2200 |  0.00000E+00 |  0.00000E+00 |           45 |  6.78900E+01\n",
      "   23 |    2300 |  0.00000E+00 |  0.00000E+00 |           45 |  6.60300E+01\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "   24 |    2400 |  0.00000E+00 |  0.00000E+00 |           45 |  6.43500E+01\n",
      "   25 |    2500 |  0.00000E+00 |  0.00000E+00 |           45 |  6.36500E+01\n",
      "   26 |    2600 |  0.00000E+00 |  0.00000E+00 |           45 |  6.16100E+01\n",
      "   27 |    2700 |  0.00000E+00 |  0.00000E+00 |           45 |  6.03300E+01\n",
      "   28 |    2800 |  0.00000E+00 |  0.00000E+00 |           43 |  5.94900E+01\n",
      "   29 |    2900 |  0.00000E+00 |  0.00000E+00 |           43 |  5.76600E+01\n",
      "   30 |    3000 |  0.00000E+00 |  0.00000E+00 |           43 |  5.65100E+01\n",
      "   31 |    3100 |  0.00000E+00 |  0.00000E+00 |           42 |  5.54700E+01\n",
      "   32 |    3200 |  0.00000E+00 |  0.00000E+00 |           42 |  5.42800E+01\n",
      "   33 |    3300 |  0.00000E+00 |  0.00000E+00 |           42 |  5.36700E+01\n",
      "   34 |    3400 |  0.00000E+00 |  0.00000E+00 |           42 |  5.27000E+01\n",
      "   35 |    3500 |  0.00000E+00 |  0.00000E+00 |           42 |  5.21800E+01\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "   36 |    3600 |  0.00000E+00 |  0.00000E+00 |           42 |  5.19700E+01\n",
      "   37 |    3700 |  0.00000E+00 |  0.00000E+00 |           39 |  5.14400E+01\n",
      "   38 |    3800 |  0.00000E+00 |  0.00000E+00 |           39 |  5.10500E+01\n",
      "   39 |    3900 |  0.00000E+00 |  0.00000E+00 |           39 |  5.08500E+01\n",
      "   40 |    4000 |  0.00000E+00 |  0.00000E+00 |           39 |  5.05500E+01\n",
      "   41 |    4100 |  0.00000E+00 |  0.00000E+00 |           39 |  5.01200E+01\n",
      "   42 |    4200 |  0.00000E+00 |  0.00000E+00 |           39 |  4.93800E+01\n",
      "   43 |    4300 |  0.00000E+00 |  0.00000E+00 |           39 |  4.88000E+01\n",
      "   44 |    4400 |  0.00000E+00 |  0.00000E+00 |           39 |  4.85700E+01\n",
      "   45 |    4500 |  0.00000E+00 |  0.00000E+00 |           39 |  4.81400E+01\n",
      "   46 |    4600 |  0.00000E+00 |  0.00000E+00 |           38 |  4.76700E+01\n",
      "   47 |    4700 |  0.00000E+00 |  0.00000E+00 |           38 |  4.71800E+01\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "   48 |    4800 |  0.00000E+00 |  0.00000E+00 |           37 |  4.66100E+01\n",
      "   49 |    4900 |  0.00000E+00 |  0.00000E+00 |           37 |  4.62700E+01\n",
      "   50 |    5000 |  0.00000E+00 |  0.00000E+00 |           37 |  4.62000E+01\n",
      "   51 |    5100 |  0.00000E+00 |  0.00000E+00 |           37 |  4.57600E+01\n",
      "   52 |    5200 |  0.00000E+00 |  0.00000E+00 |           37 |  4.56000E+01\n",
      "   53 |    5300 |  0.00000E+00 |  0.00000E+00 |           37 |  4.53400E+01\n",
      "   54 |    5400 |  0.00000E+00 |  0.00000E+00 |           36 |  4.49200E+01\n",
      "   55 |    5500 |  0.00000E+00 |  0.00000E+00 |           36 |  4.46700E+01\n",
      "   56 |    5600 |  0.00000E+00 |  0.00000E+00 |           36 |  4.45100E+01\n",
      "   57 |    5700 |  0.00000E+00 |  0.00000E+00 |           36 |  4.43800E+01\n",
      "   58 |    5800 |  0.00000E+00 |  0.00000E+00 |           36 |  4.42300E+01\n",
      "   59 |    5900 |  0.00000E+00 |  0.00000E+00 |           36 |  4.41400E+01\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "   60 |    6000 |  0.00000E+00 |  0.00000E+00 |           36 |  4.38100E+01\n",
      "Function value: 36\n",
      "Subset: [ 5  9 12 31 36 37 47 52 68 99]\n"
     ]
    }
   ],
   "source": [
    "from pymoo.algorithms.so_genetic_algorithm import GA\n",
    "from pymoo.optimize import minimize\n",
    "\n",
    "algorithm = GA(\n",
    "    pop_size=100,\n",
    "    sampling=MySampling(),\n",
    "    crossover=BinaryCrossover(),\n",
    "    mutation=MyMutation(),\n",
    "    eliminate_duplicates=True)\n",
    "\n",
    "res = minimize(problem,\n",
    "               algorithm,\n",
    "               ('n_gen', 60),\n",
    "               seed=1,\n",
    "               verbose=True)\n",
    "\n",
    "print(\"Function value: %s\" % res.F[0])\n",
    "print(\"Subset:\", np.where(res.X)[0])\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, we can compare the found subset with the optimum known simply through sorting:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimal Subset: [ 5  9 12 31 36 37 47 52 68 99]\n",
      "Optimal Function Value: 36\n"
     ]
    }
   ],
   "source": [
    "opt = np.sort(np.argsort(L)[:n_max])\n",
    "print(\"Optimal Subset:\", opt)\n",
    "print(\"Optimal Function Value: %s\" % L[opt].sum())"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
